13. Demographic Analysis

Key Points

Importance of Representative Data

Demographic Analysis

Demographic analysis is so important, especially in healthcare, because our clinical trials and machine learning models need to be representative of the general population. This is not always fully achievable, given limited trials and very rare conditions, but it is something we need to strive for, and any potential issue should be identified as early as possible.

If we don't have a properly representative dataset, we won't know how a drug or prediction might affect a certain age, race, or gender group, which could lead to significant issues for those not represented.

When completing a demographic analysis, it can be helpful to group data into buckets or bins.

In this walkthrough, we used np.arange() to create the bucket ranges, used those ranges to build the bin labels with a .join() method, and finally used pd.cut() to create our new "age_bins" buckets. You can use whatever methods you like to complete this task. We also took the opportunity to change the sex or gender column from 0, 1 to "male" and "female" using replace(), to further break down our categories and demographics. Again, this should all be review, but we include it here to be clear.
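The steps described above can be sketched roughly as follows. This is a minimal illustration, not the walkthrough's actual notebook code: the toy data, the column names "age" and "sex", the 10-year bin width, and the 0 → "female", 1 → "male" mapping are all assumptions, so check your own dataset's data dictionary before reusing the mapping.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names and numeric codes are assumptions.
df = pd.DataFrame({
    "age": [29, 34, 41, 47, 52, 58, 63, 67, 71, 76],
    "sex": [1, 0, 1, 1, 0, 1, 0, 1, 0, 1],
})

# Bucket ranges: edges every 10 years from 20 up to 80.
bin_edges = np.arange(20, 90, 10)

# One label per bucket, built by joining the lower and upper bounds, e.g. "20-29".
bin_labels = [
    "-".join([str(lo), str(hi - 1)])
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:])
]

# right=False makes each bucket include its lower edge and exclude its upper edge.
df["age_bins"] = pd.cut(df["age"], bins=bin_edges, labels=bin_labels, right=False)

# Assumed coding: 0 -> "female", 1 -> "male".
df["sex"] = df["sex"].replace({0: "female", 1: "male"})

# Count how many records fall in each age bucket, split by sex.
print(pd.crosstab(df["age_bins"], df["sex"]))
```

With right=False, a 30-year-old lands in "30-39" rather than in the bucket below, which keeps the labels consistent with the edges.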

You may also use different age buckets/bins and see how the value distribution looks for those bins.
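One way to try different bucket sizes is to wrap the binning in a small helper and compare the counts. The sketch below uses made-up ages, and the helper name age_bin_counts and its start/stop defaults are hypothetical choices for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical ages, for illustration only.
ages = pd.Series([29, 34, 41, 47, 52, 58, 63, 67, 71, 76], name="age")

def age_bin_counts(ages, width, start=20, stop=80):
    """Count ages per bucket for a given bucket width (hypothetical helper)."""
    # Edges run from start up to at least stop, in steps of width.
    edges = np.arange(start, stop + width, width)
    # sort=False keeps the buckets in age order instead of sorting by count.
    return pd.cut(ages, bins=edges, right=False).value_counts(sort=False)

# Compare how the distribution looks at several bucket widths.
for width in (10, 20, 30):
    print(f"bucket width = {width}")
    print(age_bin_counts(ages, width))
```

Wider buckets smooth out the distribution but can hide a sparsely populated age range, so it can be worth looking at more than one width.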

Additional Resources

Demographic Analysis

Reflect

QUESTION:

Why is it important to do a demographic analysis on your datasets?

ANSWER:

While there are many reasons for this, you need to make sure that your datasets are representative of the population you will be attempting to serve. If they are not, the models created later will not generalize well in production.

Code

If you need the code, it is available at https://github.com/udacity.